Abstract

Environment has long been known to play an important part in disease etiology. However, not many genome- 
wide association studies take environmental factors into consideration. There is also a need for new methods to 
identify the gene-environment interactions. In this study, we propose a 2-step approach incorporating an influence
measure that captures the pure gene-environment effect. We found that pure gene-age interaction has a stronger
association than the genetic effect alone for systolic blood pressure, as measured by counting the
number of single-nucleotide polymorphisms (SNPs) reaching a certain significance level. We analyzed the subjects
by dividing them into two age groups and found no overlap in the top identified SNPs between them. This
suggested that age might have a nonlinear effect on genetic association. Furthermore, the scores of the top SNPs
for the two age subgroups were about 3 times those obtained when using all subjects for systolic blood pressure.
In addition, the scores of the older age subgroup were much higher than those for the younger group. The results 
suggest that genetic effects are stronger in older age and that genetic association studies should take 
environmental effects into consideration, especially age. 



Background 

Gene-environment interactions (G x E) have long been 
known to play an important role in complex disease 
etiology. Understanding these will reduce the bias in vari- 
able selection because of different cohort exposure to 
the environment [1]. Previous methods of studying G x E
effects have mainly included candidate genes, case-only 
design, and family-based association studies [1,2]. These 
methods have made their respective assumptions in 
terms of biological knowledge, independence of gene and 
environment, and kinship information. There is an 
urgent need for new methods to detect gene-environ- 
ment effects. With the emergence of genome-wide asso- 
ciation studies (GWAS), data mining methods, such as 
generalized linear models incorporating G x E terms, are
becoming popular [3,4]. We do not know, however, how
much of the association identified is a result of main
effects and how much is a result of pure G x E interactions.
In this study, we used a 2-step method that first
aggressively removed main effects from both gene and
environment, and then tested for the strength of pure
G x E interaction. We found that, for systolic blood pressure
(SBP), the pure gene-age interaction was stronger
than the main effect of single-nucleotide polymorphisms 
(SNPs) alone. We also analyzed the genetic association 
separately in two age groups to test the effect of age. We 
found that the marker profiles were quite distinct in dif- 
ferent age cohorts. This suggested that age might have a 
strong nonlinear effect on genetic association. 
Methods

Dataset 

The dataset adopted in this study was provided by 
Genetic Analysis Workshop 18 (GAW18), for which real 
phenotypes and genotypes from the San Antonio Family 
Studies are used. We focused on chromosome 3, which 
includes 62,915 SNPs. There were 142 unrelated indivi- 
duals, for whom information is available on SBP, diasto- 
lic blood pressure (DBP), age, smoking status, and use 
of antihypertension medication. Although there were
4 longitudinal measurements of the phenotypes, we considered
only the first measurement, which had the fewest
missing values. After removing the missing values,
the data for 130 unrelated individuals were retained for 
further analysis. 

Detecting pure G x E effects 

Step 1: Removal of main effect of gene and environment 

For each SNP and an environmental factor, we remove
their main effects on y by taking the residual (res) of
projection pursuit regression (PPR) [5]. The PPR
smooths the regression surface following an additive
model of (nonlinear) smoothing functions (S) based on
linear combinations of the predictors ($\alpha_m \cdot x$), expressed as
follows:

$$\hat{y}(x) = \sum_{m=1}^{M} S_{\alpha_m}(\alpha_m \cdot x)$$

where $S_{\alpha_m}$ is the $m$th smooth function of a linear
combination of x, and the residual is res $= y - \hat{y}(x)$.
Because PPR does not assume a linear
relation of the predictors, both nonlinear and linear
effects can be removed from the residual. It is calculated
using the R function ppr in a stepwise manner by first
removing the main effect of the environment, and then
the main effect of the gene, without considering the
interactions among them.
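
The stepwise removal described above can be illustrated with a minimal Python sketch. It does not implement PPR: as a simplifying assumption, it swaps in group-mean residualization, which is what a smooth function of a single discrete predictor reduces to. The function names `remove_main_effect` and `stepwise_residual` are hypothetical and the data are toy values, not from the study.

```python
from collections import defaultdict


def remove_main_effect(y, factor):
    """Subtract from each y the mean of y over observations sharing its factor level.

    Stand-in for PPR on one discrete predictor (an assumption, not the paper's code).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, level in zip(y, factor):
        sums[level] += yi
        counts[level] += 1
    return [yi - sums[level] / counts[level] for yi, level in zip(y, factor)]


def stepwise_residual(y, env, snp):
    """Remove the environment main effect first, then the SNP main effect."""
    res_env = remove_main_effect(y, env)
    return remove_main_effect(res_env, snp)


# Toy data: a phenotype with main effects of both factors plus an interaction.
env = [0, 0, 1, 1]
snp = [0, 1, 0, 1]
y = [1.0, 2.0, 3.0, 6.0]
res = stepwise_residual(y, env, snp)
print(res)  # only the interaction pattern remains
```

In this balanced toy example the residual retains only the interaction contrast, which is what Step 2 is designed to score.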

Step 2: Evaluation of interactions by an influence measure 

An influence measure was introduced by Lo and Zheng
[6] to capture the interaction effects based on partitions 
by a variable subset. It has been shown to be very effec- 
tive in capturing joint effects, even when main effects 
are weak. Important SNPs were found for inflammatory 
bowel disease and confirmed by later experimental 
results [7]. This also worked in a classification algorithm 
that achieved the lowest error rates in predicting several 
cancer datasets [8]. 

Assuming that we have discrete explanatory variables,
for a given subset of variables (either gene-gene [G x G]
or G x E), a partition of the observations can be created.
For example, if $X_1$ and $X_2$ take values of either 0 or 1, we
will have a partition of four cells. If the phenotype of
interest is Y, the influence measure takes the form

$$I = \sum_{i} n_i^2 \left(\bar{Y}_i - \bar{Y}\right)^2$$

where $i$ runs through the partition cells, $n_i$ is the number
of observations in cell $i$, $n$ is the total number of
observations, $\bar{Y}_i$ is the local mean of phenotypes in cell $i$,
and $\bar{Y}$ is the overall mean. When the partition contains
no association information, the cell mean $\bar{Y}_i$ should be very
close to the overall mean $\bar{Y}$. By contrast, when a subset of
variables has a joint influence on Y, the difference
between $\bar{Y}_i$ and $\bar{Y}$ will be large. The effect will be captured
by the squared deviation and weighted by $n_i^2$, resulting in
an elevated I-score. The proposed method complements
main effect methods: one can first find main effects by
using another method, and then add the interaction features
back.
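
As a minimal sketch of the influence measure described above, the following Python function computes the sum over partition cells of the squared cell size times the squared deviation of the cell mean from the overall mean. The function name `influence_score` is hypothetical, the data are toy values, and any normalizing constant is omitted because it cannot be recovered from the text.

```python
from collections import defaultdict


def influence_score(y, *variables):
    """I = sum over cells of n_i^2 * (cell mean - overall mean)^2.

    `variables` are discrete predictors; their joint values define the cells.
    """
    overall_mean = sum(y) / len(y)
    cells = defaultdict(list)
    for yi, key in zip(y, zip(*variables)):
        cells[key].append(yi)
    return sum(
        len(vals) ** 2 * (sum(vals) / len(vals) - overall_mean) ** 2
        for vals in cells.values()
    )


# Toy data: two binary variables whose effect on y is purely joint.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0.5, -0.5, -0.5, 0.5]
joint = influence_score(y, x1, x2)   # partition of four cells
marginal = influence_score(y, x1)    # partition of two cells
print(joint, marginal)
```

Here the two-variable partition yields a positive score while the single-variable partition yields zero, illustrating how the measure can pick up a joint effect when the marginal effect is absent.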

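For each SNP and an environmental factor, the phenotype
of interest (Y) is replaced by the residual calculated
in Step 1, resulting in:

$$I = \sum_{i} n_i^2 \left(\overline{\mathrm{res}}_i - \overline{\mathrm{res}}\right)^2$$

where $\overline{\mathrm{res}}_i$ is the mean residual in cell $i$ and
$\overline{\mathrm{res}}$ is the overall mean residual.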

The significance of the I-score is evaluated by permu- 
tation on the phenotype of the data set 10^ times. 
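
A permutation test of this kind can be sketched in Python as follows. The number of permutations (`n_perm=999`), the fixed seed, and the (hits + 1)/(n_perm + 1) estimator are illustrative assumptions, not the settings used in the study; the function names are hypothetical.

```python
import random
from collections import defaultdict


def influence_score(y, labels):
    """I = sum over cells of n_i^2 * (cell mean - overall mean)^2."""
    overall_mean = sum(y) / len(y)
    cells = defaultdict(list)
    for yi, label in zip(y, labels):
        cells[label].append(yi)
    return sum(
        len(v) ** 2 * (sum(v) / len(v) - overall_mean) ** 2 for v in cells.values()
    )


def permutation_p_value(y, labels, n_perm=999, seed=0):
    """Estimate a p value by shuffling the phenotype and recomputing the score."""
    rng = random.Random(seed)
    observed = influence_score(y, labels)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if influence_score(shuffled, labels) >= observed:
            hits += 1
    # Add-one estimator keeps the estimate strictly positive (an assumed convention).
    return (hits + 1) / (n_perm + 1)


# Toy data: the phenotype is perfectly separated by the cell labels.
labels = [0] * 10 + [1] * 10
y = [0.0] * 10 + [1.0] * 10
p = permutation_p_value(y, labels)
print(p)
```

The smallest p value this estimator can report is 1/(n_perm + 1), so the number of permutations bounds the significance levels that can be resolved.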
Dichotomization of age 

Smoking and medication are both discrete variables. We 
dichotomize age by a 2-mean clustering method (k-means 
in R). The cutoff value was found to be 55. Thus, if age is 
>55 years, the age is coded as 1, otherwise it is coded 0. 
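
The dichotomization can be sketched as a one-dimensional 2-means clustering in plain Python. This is an illustration under assumptions, not the R `kmeans` call used in the study: the initialization at the minimum and maximum, the function name `two_means_cutoff`, and the ages are all hypothetical.

```python
def two_means_cutoff(values, max_iter=100):
    """Return the midpoint between the two cluster centers of a 1-D 2-means fit."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        cut = (lo + hi) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


# Hypothetical ages, not from the GAW18 data.
ages = [30, 35, 40, 45, 60, 65, 70, 75]
cutoff = two_means_cutoff(ages)
coded = [1 if age > cutoff else 0 for age in ages]
print(cutoff, coded)
```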

Nonlinear gene-age association 

Current GWAS assume that a biomarker affects disease, 
independent of age, so most SNPs identified in the litera- 
ture are those with strong association across the whole 
age range. What if some genetic effect is nonlinear with 
age: In one's youth a group of SNPs influences the phe- 
notype, whereas in old age some other group of SNPs 
takes effect? To test this hypothesis, we divided the indi- 
viduals into two groups by the same 2-mean clustering 
threshold, at age 55 years. There were 76 subjects age
≤55 years (the younger group) and 54 subjects age >55
years (the older group). We selected the top SNPs (G
effect) within each group by I-score and checked to what
extent these SNPs overlapped.
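
One way to check the overlap between two ranked SNP lists is sketched below in Python. The definition of the "first overlap" used here (the shared SNP whose worse rank across the two lists is smallest) is an assumption, and the function names and SNP labels are hypothetical.

```python
def top_k_overlap(ranked_a, ranked_b, k):
    """SNPs that appear in the top k of both ranked lists."""
    return set(ranked_a[:k]) & set(ranked_b[:k])


def first_overlap_ranks(ranked_a, ranked_b):
    """1-based ranks (in a, in b) of the shared SNP with the smallest worse rank.

    Returns None when the lists share no SNP.
    """
    rank_in_b = {snp: r for r, snp in enumerate(ranked_b, start=1)}
    best = None
    for r_a, snp in enumerate(ranked_a, start=1):
        r_b = rank_in_b.get(snp)
        if r_b is not None and (best is None or max(r_a, r_b) < max(best)):
            best = (r_a, r_b)
    return best


# Hypothetical labels, ranked by I-score within each age group.
younger = ["snp_a", "snp_b", "snp_c", "snp_d"]
older = ["snp_x", "snp_c", "snp_y", "snp_a"]
print(top_k_overlap(younger, older, 2))     # empty set: no shared SNP in the top 2
print(first_overlap_ranks(younger, older))  # ranks of the first shared SNP
```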

Results 

Detecting pure gene-environment effects 

Pure gene-age association is stronger than SNP main effect for SBP

Using the 2-step approach, the I-score of the pure interactions
of G x E was calculated after the main effects of
both SNP and environmental factors were removed;
p values were obtained by permuting the phenotype 10'' 
times. Table 1 displays the number of SNPs for which the
corresponding pure G x E interactions reached each significance
level. The results for G alone appear in the last
rows of the table. Pure G x E interaction shows a strong
association, even when the main effect has been taken
away. Consider, for example, SBP: gene-age (G x age)
interaction resulted in 150 SNPs with a p value <$10^{-3}$
and 29 SNPs with a p value <$10^{-4}$, far more than the
main genetic effect, which had only 41 SNPs with a p value
<$10^{-3}$ and 5 SNPs with a p value <$10^{-4}$. Smoking and
medication had no pure interaction effects with a p value <$10^{-3}$.
For comparison purposes, the main effects of E only were
also calculated, using the F-statistic of a linear regression
model with all E terms included, which had a
p value of 4.15 x 10"^^; the main effects of all E terms 
on DBP gave a pvalue of 9.97 x 10 

Nonlinear gene-age association 
Analysis for SBP 

The subjects were divided into two groups (older than 
age 55 years or 55 years of age and younger) and the 
I-score of SNPs within each age group was calculated 
and ranked (Table 2). There were 3 very interesting
observations: 

1. There was no overlap between the top SNPs from 
the two age groups (the first overlap occurred at the 
202nd and 92nd SNP in the two groups, respectively). 

2. The I-scores of the top SNPs in age subgroups were
about 3 times as great as the overall I-scores calculated
disregarding age (using all subjects) (see Table 2). We
know that under the null hypothesis, when no association
exists for a marker subset, the expected I-score is 1.
The result suggests that, in this dataset, most
SNPs did not affect blood pressure uniformly across all
age ranges. The number 1 marker, rs16851260, which
has an I-score of 90.44 identified by pure SNP-age interaction
using 130 subjects, only ranked 6th in the subgroup
of age >55 years but had a much higher I-score
of 142.42. This means that this marker has a stronger

Table 1 The number of SNPs reaching three levels of significance (via permutation)

                        Significance level reached
Effect      Phenotype   <10^-3          <10^-4          <10^-5
G x age*    SBP         150 (0.24%)     29 (0.046%)     1 (0.0016%)
            DBP         92 (0.15%)      11 (0.017%)     4 (0.0064%)
G†          SBP         41 (0.065%)     5 (0.0079%)     1 (0.0016%)
            DBP         58 (0.092%)     6 (0.0095%)     0 (0)

The percentages of the number of significant SNPs out of the total number of SNPs (62,915) are shown in parentheses.
*The pure G x age interactions found by the 2-step method.
†The main effect of G by I-score.
Table 2 Nonlinear age effect on genetic association for SBP

a. Age ≤55 years (76 observations)

Rank   SNP no   SNP name     Gene    I-score   Overall I-score
1      14156    rs9834970    NA      77.94     20.70
2      4557     rs159154     BRPF1   58.90     19.03
3      12457    rs12493391   NA      68.29     34.50
4      58703    rs2239626    DGKG    65.98     10.36
5      269      rs12715500   NA      65.86     39.59
6      32020    rs1350790    NA      65.69     12.43
7      24555    rs10510935   NA      64.50     27.97
8      55694    rs1454149    NA      53.61     14.63
9      24613    rs4688557    NA      61.44     11.93
10     24461    rs1517931    NA      59.60     18.07

b. Age >55 years (54 observations)

Rank   SNP no   SNP name     Gene    I-score   Overall I-score
1      49032    rs9825291    NA      172.46    53.61
2      25762    rs7427984    NA      157.36    57.98
3      36285    rs7647147    NA      150.21    93.71
4      50876    rs2669973    NA      149.84    59.60
5      1951     rs711578     LRRN1   148.09    75.97
6      51800    rs16851250   NA      142.42    90.44
7      49026    rs9875837    NA      139.68    55.26
8      33365    rs17175829   NA      134.32    41.67
9      52781    rs2686110    BDH1    134.12    54.28
10     50748    rs1016518    NA      133.97    74.78



genetic association in the older age group, and that calculating
the score in the general population would dilute this
marker's association effect.

3. Moreover, for SBP, the average I-score in the older
age group is much higher than in the younger subgroup.
For example, for the top 10 markers, the average is
2.2 times as high. The result suggests that genetic association
for SBP is much stronger in the older age group than in
the younger age group.

Analysis for DBP 

Similar to the results for SBP, nonoverlapping
top SNPs were observed for DBP in the younger
and older age groups (Table 3). The first overlapping
top marker occurred at the 69th and 108th place in the
two groups, respectively, which suggests that there
might be a nonuniform genetic effect across the age range.
In addition, the association effect in the older age subgroup
is stronger than when using all subjects, as reflected by the
higher I-scores of the subgroup. Finally, the average I-score
in the older age group is higher than in the younger group;
for the top 10 markers, it is 1.4 times as high.
This shows that the genetic effect is slightly stronger in
old age than in youth for DBP. Overall, the findings for
DBP are consistent with those for SBP, but with weaker
magnitude.
Table 3 Nonlinear age effect on genetic association for DBP

a. Age ≤55 years (76 observations)

Rank   SNP no   SNP name     Gene     I-score   Overall I-score
1      36509    rs11706066   SIDT1    50.64     23.46
2      310      rs12637032   NA       50.50     23.34
3      51841    rs13094783   NA       46.00     42.74
4      24461    rs1517931    NA       45.76     18.07
5      26613    rs7620998    EIF4E3   44.49     11.83
6      14309    rs336597     GOLGA4   44.42     22.46
7      59571    rs12696583   LPP      44.40     9.83
8      3488     rs7628504    GRM7     43.66     11.35
9      38122    rs4687833    IGSF11   43.51     13.31
10     14166    rs9834970    NA       43.22     20.70

b. Age >55 years (54 observations)

Rank   SNP no   SNP name     Gene     I-score   Overall I-score
1      51531    rs1996264    NA       65.64     40.85
2      51532    rs10936243   NA       66.64     40.85
3      51534    rs11918801   NA       66.64     40.85
4      51537    rs12639469   NA       66.64     40.28
5      52553    rs6801576    NA       64.79     12.85
6      40969    rs11717333   NA       62.55     24.00
7      51533    rs10513572   NA       62.14     17.60
8      62781    rs2686110    BDH1     62.14     54.28
9      17446    rs6799581    NA       52.03     31.33
10     40970    rs13066695   NA       59.06     16.33



Discussion 

Considering SBP and DBP separately in GWAS 

Many epidemiology studies have indicated different phy- 
siology and trend of development for SBP and DBP. It 
has been reported that systolic pressure is related to the 
elasticity of the great vessels and diastolic pressure to 
peripheral resistance resulting from muscle stiffness [9]. 
Consistent with this knowledge, the important SNPs 
identified for SBP and DBP in our study had few over- 
laps, either marginally or interactively. The results sug- 
gested that it might be better to study the two 
component blood pressures separately when analyzing
hypertension. 

Considering age group separately in GWAS 

In addition to finding that pure SNP-age interaction was
stronger than the main genetic effect, we also found that the
top identified SNPs were completely different between age
groups, which suggests that the genetic effect
on blood pressure was nonlinear with respect to age.

We could estimate the probability (p value) of obtaining
two nonoverlapping sets of top markers, under the
null hypothesis that the true associated SNPs for the two
age groups are identical. Suppose there are 200 true
SNPs influencing SBP on chromosome 3, and that they
are the same for both age groups. What is the probability
that the two groups get completely nonoverlapping true
positives (TPs)? First, we need to estimate the number of
TPs selected for the two age groups. This could be done
by the procedure of controlling the false discovery rate
(FDR) [10] with p values obtained by permutations. The
procedure says: $p_{(k)} \le (k/m)\, q^*$, where $m$ is the total number
of tests, here $m = 62{,}915$, $p_{(k)}$ is the $k$th p value ranked
from smallest to largest, and $q^*$ is the FDR. So with the
permuted p values, we can estimate the FDR in the
top $k$ SNPs. For the younger age group, there are 18 SNPs
with p values <$10^{-4}$ and the estimated FDR = 0.35. So the
expected number of TPs = 0.65 × 18 ≈ 12 in the top 18
SNPs. For the older age group, there are 64 SNPs with
p values <$10^{-4}$ and the estimated FDR = 0.098. The number
of TPs = 0.902 × 64 ≈ 58 in the top 64 SNPs. The probability
of having no younger group TP among the older group TPs
can be calculated using a hypergeometric probability:

$$P = \binom{200-58}{12} \Big/ \binom{200}{12} = 0.014$$

If the number of true markers is assumed to be 100,
this probability is much more significant, at about $10^{-5}$.
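
The arithmetic in this argument can be reproduced with a short Python sketch. The p value threshold of 1e-4 is an assumption inferred from the reported FDR values (62,915 × 1e-4 / 18 ≈ 0.35 and 62,915 × 1e-4 / 64 ≈ 0.098); the function names are hypothetical.

```python
from math import comb


def fdr_at_rank(p_threshold, k, m):
    """FDR level q* at which the k-th smallest p value meets p_(k) <= (k/m) q*."""
    return m * p_threshold / k


def prob_no_overlap(n_true, tp_a, tp_b):
    """Hypergeometric probability that tp_a draws from n_true miss all tp_b marked ones."""
    return comb(n_true - tp_b, tp_a) / comb(n_true, tp_a)


m = 62915            # SNPs tested on chromosome 3
threshold = 1e-4     # assumed p value cutoff (inferred, see lead-in)

fdr_young = fdr_at_rank(threshold, 18, m)
fdr_old = fdr_at_rank(threshold, 64, m)
tp_young = round((1 - fdr_young) * 18)   # expected true positives, younger group
tp_old = round((1 - fdr_old) * 64)       # expected true positives, older group

p_200 = prob_no_overlap(200, tp_young, tp_old)   # 200 true SNPs assumed
p_100 = prob_no_overlap(100, tp_young, tp_old)   # 100 true SNPs assumed
print(round(fdr_young, 3), round(fdr_old, 3), tp_young, tp_old, p_200, p_100)
```

The result is sensitive to the assumed number of true SNPs: halving it from 200 to 100 lowers the probability by roughly three orders of magnitude.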

Also, the strength of genetic association was much 
stronger in the older group than in the younger, especially 
for SBP. The results suggest that age has a nonlinear 
impact on genetic association and that the nonlinear effect 
of age should be considered in GWAS, perhaps by conducting
studies in separate age groups. Because this study
has a limited sample size, further research on larger num- 
bers of subjects should be conducted. 

Pure G x E interaction-identified SNPs

The pure G x age interaction identified for SBP with a p value
reaching $10^{-5}$ is rs6446285 on gene BSN. The gene is
involved in the organization of the cytomatrix at the
nerve terminal's active zone that regulates neurotransmitter
release, and is involved in the formation of retinal
photoreceptor ribbon synapses [11]. The 4 SNP-age interactions
reaching $10^{-5}$ for DBP are all from gene PBRM1.
Mutations at this location have been associated with renal
cell carcinoma [12].

Conclusions 

This study demonstrated the strong G x E interactions 
for blood pressure. Even when main effect has been 
removed, pure G x E effect could be stronger than using 
main effect alone for SBP. The study also preliminarily 
explored the nonlinear age effect on genetic association 
and confirmed the hypothesis that some SNPs had a 
strong influence in a particular age range, and that the 
genetic effect might not be uniform across a person's lifespan.
The results suggest that past GWAS might have
captured only a small group of very influential SNPs that 
are effective regardless of age or other environmental fac- 
tors. There might be a lot more SNPs, such as those 
shown in this study, that are "turned on" only in a parti- 
cular age range and remain to be identified. These SNPs 
might fill in the missing heritability in the picture of 
GWAS. 